Remove clean markdown generation and artifacts. #833

Open
jottakka wants to merge 18 commits into main from cleanup/remove-clean-markdown
Conversation

@jottakka
Contributor

@jottakka jottakka commented Feb 27, 2026

Stop generating and committing clean markdown files in CI, simplify markdown serving/indexing to use source content directly, and remove .md URL suffixes from llms output and markdown alternates for the current main behavior.


Note

Medium Risk
Removes the app/api/markdown endpoint, markdown rewrites in middleware.ts, and the committed public/_markdown artifacts, which can break any consumers relying on .md URLs or the old copy/export behavior. CI/build scripts are also simplified, so reviewers should verify markdown serving and LLM tooling still work end-to-end.

Overview
Removes the clean-markdown pipeline: deletes the generate-markdown GitHub Action, drops toolkit-markdown/generate:markdown/postbuild scripts, and removes the committed public/_markdown/** outputs.

Deletes app/api/markdown/[[...slug]] and strips middleware.ts + app/layout.tsx of .md routing/content-negotiation support (including text/markdown alternates), while updating the “Copy page” override to request markdown by fetching the current pathname with Accept: text/markdown.

Simplifies toolkit docs page actions by removing the custom copy button, and updates the llmstxt workflow to run on main pushes for English-doc changes and to request a team review on auto-generated PRs.

Written by Cursor Bugbot for commit ffcf096. This will update automatically on new commits.

Stop generating and committing clean markdown files in CI, simplify markdown serving/indexing to use source content directly, and remove .md URL suffixes from llms output and markdown alternates for the current main behavior.

Made-with: Cursor
@vercel

vercel bot commented Feb 27, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| docs | Ready | Preview, Comment | Mar 3, 2026 2:07am |
| docs (staging) | Ready | Preview, Comment | Mar 3, 2026 2:07am |

Contributor

@github-actions github-actions bot left a comment

Style Review

Found 1 style suggestion(s).

Powered by Vale + Claude

@jottakka jottakka self-assigned this Feb 27, 2026

  /**
-  * Converts clean markdown to HTML for Pagefind indexing.
-  * This function expects pre-cleaned markdown (no MDX syntax).
+  * Converts markdown to HTML for Pagefind indexing.
   */
  async function markdownToHtml(markdownContent: string): Promise<string> {
Contributor

Maybe I am missing something, but Pagefind shouldn't need the markdown converted to HTML, since it runs as a postbuild step.

Contributor Author

Yeah, it called an API that required HTML internally, but that call was replaced, so this could be removed.

Contributor Author

Actually, it is needed so the pages look better in the search results; otherwise they show only flat text.

Contributor

@teallarson, with the changes you made for Algolia, is this still required? We should not have this dependency.

…pagefind

Use pagefind's addCustomRecord API instead of converting markdown to HTML,
removing the remark/remark-rehype/rehype-stringify dependency. Extract
extractFrontmatterTitle and stripMdxSyntax as tested helpers.

Made-with: Cursor
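
The commit above names `extractFrontmatterTitle` as a tested helper but does not show its body. A minimal sketch of what such a helper might look like, assuming the title lives on a single `title:` line inside a leading YAML frontmatter block (the regexes and quote handling are assumptions, not the PR's code):

```typescript
// Hypothetical sketch: the PR names this helper but its real body is not shown here.
// Returns the `title:` value from a leading YAML frontmatter block, or null.
function extractFrontmatterTitle(source: string): string | null {
  // Frontmatter must start on the first line and be closed by a `---` line.
  const frontmatter = /^---\r?\n([\s\S]*?)\r?\n---/.exec(source);
  if (!frontmatter) return null;

  // Look for a single-line `title:` entry inside the frontmatter block.
  const titleLine = /^title:\s*(.+)$/m.exec(frontmatter[1] ?? "");
  if (!titleLine) return null;

  // Trim whitespace and drop one pair of matching surrounding quotes, if any.
  return (titleLine[1] ?? "").trim().replace(/^(["'])(.*)\1$/, "$2");
}
```

This sketch ignores multi-line YAML values; a real implementation would more likely lean on a YAML parser.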
addCustomRecord treats content as flat text, so markdown syntax like
[text](url) appeared raw in search result excerpts. Restore the
remark/rehype pipeline to convert stripped MDX to HTML before indexing.

Also strip JSX component tags (<Callout>, <Steps>, etc.) from MDX
before conversion for cleaner search content.

Made-with: Cursor
Contributor

I think we can remove this whole API route/file. The copy button uses it to generate the markdown, but we can just fetch the current URL with the right headers; there is no need to call this route.

  setLoading(true);
  try {
-   const response = await fetch(`/api/markdown${pathname}.md`);
+   const response = await fetch(`/api/markdown${pathname}`, {
Contributor

I think this doesn't need to call an API; we can use fetch on the same URL and path and just add the headers.
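
A minimal sketch of the suggestion above: request the page's own pathname with `Accept: text/markdown` instead of calling `/api/markdown`. The helper names (`buildMarkdownRequest`, `fetchPageMarkdown`) and the injected `fetchImpl` parameter are hypothetical and chosen only to keep the example self-contained and testable; only the header value and the "fetch the current path" idea come from the review.

```typescript
// Hypothetical helper names; not the PR's actual code.
type MarkdownRequest = { url: string; headers: Record<string, string> };

type MinimalResponse = { ok: boolean; status: number; text(): Promise<string> };
type MinimalFetch = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<MinimalResponse>;

// Build a request for the page itself: no /api/markdown prefix, no .md suffix.
function buildMarkdownRequest(pathname: string): MarkdownRequest {
  return { url: pathname, headers: { Accept: "text/markdown" } };
}

// Fetch the current page as markdown and rely on content negotiation
// (wherever it is implemented) to return markdown instead of HTML.
async function fetchPageMarkdown(
  pathname: string,
  fetchImpl: MinimalFetch,
): Promise<string> {
  const { url, headers } = buildMarkdownRequest(pathname);
  const response = await fetchImpl(url, { headers });
  if (!response.ok) {
    throw new Error(`Failed to fetch markdown for ${url}: ${response.status}`);
  }
  return response.text();
}
```

In a browser component this could be called as `fetchPageMarkdown(pathname, fetch)`, with the result written to the clipboard.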

jottakka and others added 2 commits March 2, 2026 17:52
…mstxt workflow

- Extract cleanMdxToMarkdown() into app/_lib/clean-mdx.ts with tests.
  Strips frontmatter, imports, exports, and JSX tags from MDX while
  preserving code blocks and standard markdown content.
- /api/markdown route uses cleanMdxToMarkdown for non-toolkit pages so
  content negotiation returns clean markdown instead of raw MDX.
- Copy buttons (copy-page-override, page-actions) now fetch the page URL
  with Accept: text/markdown instead of calling /api/markdown directly.
- llmstxt workflow re-enabled on PRs and pushes to main for MDX changes.
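
The real `app/_lib/clean-mdx.ts` is not shown in this conversation. As a rough sketch of the behavior the first bullet describes (strip frontmatter, imports, exports, and JSX tags while leaving fenced code blocks untouched), assuming a simple line-by-line approach with regexes:

```typescript
// Hypothetical sketch of cleanMdxToMarkdown; the PR's real implementation may differ.
function cleanMdxToMarkdown(mdx: string): string {
  // Strip a leading YAML frontmatter block.
  const body = mdx.replace(/^---\r?\n[\s\S]*?\r?\n---\r?\n?/, "");
  const output: string[] = [];
  let inFence = false;

  for (const line of body.split(/\r?\n/)) {
    // Toggle on triple-backtick or triple-tilde fences; keep fence lines as-is.
    if (/^\s*(`{3}|~{3})/.test(line)) {
      inFence = !inFence;
      output.push(line);
      continue;
    }
    // Preserve everything inside a fenced code block.
    if (inFence) {
      output.push(line);
      continue;
    }
    // Drop single-line ESM import/export statements.
    if (/^(import|export)\s/.test(line)) continue;
    // Remove JSX component tags (capitalized names) but keep their children.
    output.push(line.replace(/<\/?[A-Z][A-Za-z0-9.]*(\s[^<>]*)?\/?>/g, ""));
  }

  // Collapse blank-line runs left behind by removed lines.
  return output.join("\n").replace(/\n{3,}/g, "\n\n").trim();
}
```

This sketch does not handle multi-line imports or exports, and it would also strip capitalized tags that appear in inline code; an AST-based approach would be more robust.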

Made-with: Cursor
Cloudflare handles content negotiation and MDX-to-markdown conversion
at the edge. Remove the server-side cleanup function and its tests.

Made-with: Cursor
The previous SHA referenced a force-pushed commit that no longer exists,
causing the llmstxt workflow to fail its diff and skip regeneration.

Made-with: Cursor
Cloudflare handles content negotiation and markdown serving at the edge.
Remove the API route and all middleware code that rewired requests to it:
handleContentNegotiation, buildMarkdownPath, .md URL rewrites,
AI agent detection, and related helpers.

Made-with: Cursor
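
For context, the removed `handleContentNegotiation` code is not shown here. A minimal sketch of what an `Accept`-based check of this kind typically involves, with hypothetical function names and a deliberately simple tie-breaking rule (markdown wins when its weight is at least that of HTML):

```typescript
// Hypothetical sketch; not the middleware code removed by this PR.
// Parse an Accept header into a map of media type -> q weight (default 1).
function parseAccept(header: string): Map<string, number> {
  const weights = new Map<string, number>();
  for (const part of header.split(",")) {
    const [type, ...params] = part.trim().split(";");
    if (!type) continue;
    let q = 1;
    for (const param of params) {
      const [key, value] = param.trim().split("=");
      if (key === "q" && value !== undefined) {
        const parsed = Number.parseFloat(value);
        if (!Number.isNaN(parsed)) q = parsed;
      }
    }
    weights.set(type.trim().toLowerCase(), q);
  }
  return weights;
}

// True when the client explicitly accepts text/markdown at least as strongly as text/html.
function prefersMarkdown(acceptHeader: string | null): boolean {
  if (!acceptHeader) return false;
  const weights = parseAccept(acceptHeader);
  const markdown = weights.get("text/markdown") ?? 0;
  const html = weights.get("text/html") ?? 0;
  return markdown > 0 && markdown >= html;
}
```

With negotiation handled at the edge, a check like this no longer needs to live in `middleware.ts`.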
  }

- function SearchHit({ hit }: { hit: HitRecord }) {
+ function getHitUrl(hit: DocSearchRecord): string {
Contributor

Not sure why this is here now. Maybe some conflict with main?

Contributor Author

Fixed!

The /api/markdown route was the only consumer of public/toolkit-markdown/.
With that route deleted, nothing reads the generated files. Remove:
- generate-toolkit-markdown.ts (script + test)
- pagefind-toolkit-content.ts (markdown formatter + test)
- toolkit-markdown build step from package.json

Made-with: Cursor
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

3 participants